Audiovisual streaming in voicing perception: new evidence for a low-level interaction between audio and visual modalities
Authors
Abstract
Audio-visual (AV) interaction in speech has mostly been considered in terms of redundancy and complementarity at the phonetic level, but a few experiments have shown that it also plays a significant role in early auditory analysis. A new paradigm is proposed that uses the pre-voicing component (PVC) excised from a true /b/. When this so-called target PVC is added to a /p/, the result is clearly perceived as /b/. Moreover, varying the amplitude of the target PVC yields a perceptual continuum between /p/, when the amplitude is set to 0, and /b/, at the original amplitude. In the audio channel, adding a series of PVCs at a fixed low amplitude before and after the target creates a stream of regular sounds that are not related to any visible event. By contrast, the bilabial opening of the /p/ is a specific speech gesture that is visible in the video channel. The target PVC and the visible gesture are likewise not redundant events. Depending on its intensity level, the target PVC added to an audio /p/ can then either be absorbed into a stream of other PVCs or be phonetically fused with the /p/ so that /b/ is perceived. To study the competition between these two alternatives and the role of AV interaction, we use a 2×2 factorial design contrasting Clear/Stream and Audio/AV conditions, while controlling the amplitude of the target PVC. The “Clear” condition contains no stream of PVCs and provides the baseline. The streaming effect by itself is significant in the audio condition, but the novel finding is a strong AV interaction. When a stream of PVCs is present, the rate of perceived /p/ is higher in the “AV” condition than in the “Audio” condition, suggesting that the visible lip-opening gesture increases the tendency to isolate the formant trajectory towards the vowel from the PVC, and hence increases the perception of unvoiced stimuli.
We conclude that the process of low level audio streaming is reinforced when the visual information is not redundant, and that, in this case, the phonetic fusion of the voicing cue is disadvantaged by visual information.
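The stimulus logic described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the gain steps, the context-PVC gain, the number of context PVCs, and the silent gap are all hypothetical, and the sketch assumes that "adding" the PVC to a /p/ means placing it immediately before the /p/ samples.

```python
from itertools import product


def scale_pvc(pvc, gain):
    """Scale pre-voicing samples by a gain (0 = no PVC, 1 = original amplitude)."""
    return [gain * sample for sample in pvc]


def build_stimulus(pvc, p_sound, gain, stream=False,
                   stream_gain=0.2, n_context=3, gap=4):
    """Return the audio samples for one trial.

    Assumption: the target PVC is placed directly before the /p/ samples.
    With stream=True, n_context low-amplitude PVCs (hypothetical values)
    are added before and after the target, separated by `gap` silent samples.
    """
    silence = [0.0] * gap
    context_pvc = scale_pvc(pvc, stream_gain)
    leading, trailing = [], []
    if stream:
        for _ in range(n_context):
            leading += context_pvc + silence
            trailing += silence + context_pvc
    target = scale_pvc(pvc, gain) + list(p_sound)
    return leading + target + trailing


def design(gains):
    """Cross the 2x2 factors (Clear/Stream x Audio/AV) with each target gain."""
    factors = product(("Clear", "Stream"), ("Audio", "AV"))
    return [(context, modality, g) for context, modality in factors for g in gains]


if __name__ == "__main__":
    # Hypothetical five-step continuum from /p/ (gain 0) to /b/ (gain 1).
    for trial in design([0.0, 0.25, 0.5, 0.75, 1.0]):
        print(trial)
```

In this sketch the Audio/AV factor only labels the trial; pairing the audio with the video of the lip-opening gesture would happen at presentation time.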
Similar resources
Distorted visual information influences audiovisual perception of voicing
Research has shown that visual information becomes less reliable when images are severely distorted. Furthermore, while voicing is generally identified from acoustical cues, it may also provide perception with visual cues. The current study investigated the impact of video distortion on the audiovisual perception of voicing. Audiovisual stimuli were presented to 30 participants with the origina...
On the tip of the tongue: Modulation of the primary motor cortex during audiovisual speech perception
Recent neurophysiological studies show that cortical brain regions involved in the planning and execution of speech gestures are also activated in processing speech sounds. These findings suggest that speech perception is in part mediated by reference to the motor actions afforded in the speech signal. Since interactions between auditory and visual modalities are beneficial in speech perception...
Changes in audio-visual speech perception during adulthood
Audiovisual speech perception research has shown an increasing use of visual information from infancy to young adulthood. The current study extends these findings by examining audiovisual speech perception from young adulthood to mid-adulthood by addressing the extent to which audio, visual and audiovisual cues are used for place of articulation identification. Responses were gathered with youn...
No Visual Mismatch Negativity (MMN) for Simultaneously Presented Audiovisual Stimuli: Evidence from Human Brain Processing
The present study employed simultaneous audiovisual stimuli in the oddball paradigm to re-examine the effects of attention on audio, visual and audiovisual perception. The study was designed to investigate whether task-related processing of audio and visual features was independent or task-related processing in one modality might influence the processing of the other. Electroencephalogram (EEG)...
Perception of Prominence Intensity in audio-visual Speech
Multimodal prosody carries a wide variety of information. Here, we investigated the roles of visual and auditory information in the production and perception of different emphasis intensities. In a series of video recordings, the intensity, location, and syntactic category of the emphasized word were varied. Physical analyses demonstrated that each speaker produced different emphasis intensi...